PyDigger - unearthing stuff about Python


Name Version Summary Date
llm-markdownify 0.2.1 Convert PDFs, images to high-quality Markdown using Vision LLMs. 2025-08-09 14:58:17
docstrange 1.1.2 Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. 2025-08-07 13:45:30
vietcombank-captcha 0.1.0 Lightweight CAPTCHA predictor for Vietcombank using ONNX 2025-08-06 04:43:20
aspose-total-net 25.7.0 Aspose.Total for Python via .NET is a Document Processing python class library that allows developers to work with Microsoft Word®, Microsoft PowerPoint®, Microsoft Outlook®, OpenOffice®, & 3D file formats without needing Office Automation. 2025-08-05 23:32:27
marker-pdf 1.8.3 Convert documents to markdown with high speed and accuracy. 2025-08-04 18:18:40
kreuzberg 3.10.1 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-07-31 11:54:20
huaweicloudsdkocr 3.1.160 OCR 2025-07-31 09:51:16
ai-resume-parser 1.0.6 AI-powered resume parser with parallel processing for multiple file formats (PDF, DOCX, images, etc.) 2025-07-29 23:13:04
document-data-extractor 1.0.4 Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract 2025-07-29 08:25:56
dedoc 2.4 Extract content and logical tree structure from textual documents 2025-07-28 09:47:38
cloudflare-peek 0.1.0 A Python utility for scraping Cloudflare-protected websites using screenshot + OCR fallback 2025-07-27 16:41:12
cleanit 0.4.9 Subtitles extremely clean 2025-07-26 19:02:05
llm-data-converter 2.2.0 Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract 2025-07-25 13:32:07
nanonets-extractor 0.1.4 A unified document extraction library supporting local CPU, GPU, and cloud processing 2025-07-23 11:17:54
invoice-ocr-mcp 1.0.4 Enterprise invoice OCR recognition MCP server - a professional invoice recognition solution based on ModelScope 2025-07-17 07:23:13
mseep-kreuzberg 3.8.2 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-07-17 03:32:28
mpxpy 0.0.17 Official Mathpix client for Python 2025-07-16 19:30:22
django-ocr_translate 0.6.3 Django app for OCR and translation 2025-07-16 11:42:24
SpectrePDF 0.2.1 A tool for processing and redacting PDFs based on target words using OCR. 2025-07-14 17:57:14
docforge 0.1.0 Forge perfect documents from any format with precision, power, and simplicity 2025-07-13 22:29:47